Hope and Fear for Discriminative Training of Statistical Translation Models

Author

David Chiang

Abstract

In machine translation, discriminative models have almost entirely supplanted the classical noisy-channel model, but are standardly trained using a method that is reliable only in low-dimensional spaces. Two strands of research have tried to adapt more scalable discriminative training methods to machine translation: the first uses log-linear probability models and either maximum likelihood or minimum risk, and the other uses linear models and large-margin methods. Here, we provide an overview of the latter. We compare several learning algorithms and describe in detail some novel extensions suited to properties of the translation task: no single correct output, a large space of structured outputs, and slow inference. We present experimental results on a large-scale Arabic-English translation task, demonstrating large gains in translation accuracy.

Similar resources

Simulating Discriminative Training for Linear Mixture Adaptation in Statistical Machine Translation

Linear mixture models are a simple and effective technique for performing domain adaptation of translation models in statistical MT. In this paper, we identify and correct two weaknesses of this method. First, we show that standard maximum-likelihood weights are biased toward large corpora, and that a straightforward preprocessing step that down-samples phrase tables can be used to counter this ...

(Hidden) Conditional Random Fields Using Intermediate Classes for Statistical Machine Translation

Generative translation models are one of the major components of Statistical Machine Translation (SMT). As in other fields, where the transition from generative to discriminative training resulted in higher performance, it seems likely that translation models should be trained in a discriminative way. But due to the nature of SMT with large vocabularies, hidden alignments, reordering, and large...

Improved Discriminative Bilingual Word Alignment

For many years, statistical machine translation relied on generative models to provide bilingual word alignments. In 2005, several independent efforts showed that discriminative models could be used to enhance or replace the standard generative approach. Building on this work, we demonstrate substantial improvement in word-alignment accuracy, partly through improved training methods, but predomi...

Discriminative Alignment Training without Annotated Data for Machine Translation

In present Statistical Machine Translation (SMT) systems, alignment is trained in a stage prior to the translation model. Consequently, alignment model parameters are not tuned as a function of the translation task, but only indirectly. In this paper, we propose a novel framework for discriminative training of alignment models with automated translation metrics as the maximization criterion. In th...

Large-scale Discriminative n-gram Language Models for Statistical Machine Translation

We extend discriminative n-gram language modeling techniques originally proposed for automatic speech recognition to a statistical machine translation task. In this context, we propose a novel data selection method that leads to good models using a fraction of the training data. We carry out systematic experiments on several benchmark tests for Chinese to English translation using a hierarchica...

Journal title:

Journal of Machine Learning Research

Volume 13

Publication date 2012
